Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics
A common assumption when training embodied agents is that the impact of
taking an action is stable; for instance, executing the "move ahead" action
will always move the agent forward by a fixed distance, perhaps with some small
amount of actuator-induced noise. This assumption is limiting; an agent may
encounter settings that dramatically alter the impact of actions: a move ahead
action on a wet floor may send the agent twice as far as it expects, and using
the same action with a broken wheel might transform the expected translation
into a rotation. Instead of relying on the impact of an action to stably
reflect its pre-defined semantic meaning, we propose to model the impact of
actions on-the-fly using latent embeddings. By combining these latent action
embeddings with a novel transformer-based policy head, we design an Action
Adaptive Policy (AAP). We evaluate our AAP on two challenging visual navigation
tasks in the AI2-THOR and Habitat environments and show that our AAP is highly
performant even when faced at inference time with missing actions and
previously unseen, perturbed action spaces. Moreover, we observe significant
improvement in robustness to such perturbed actions when evaluating in real-world
scenarios. Comment: 21 pages, 17 figures, ICLR 202
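The core idea of judging an action by its observed impact rather than by its name can be illustrated with a deliberately simple, non-learned sketch. It is not AAP: the learned latent action embeddings and transformer policy head are replaced here by a hypothetical running-mean table of each action's observed 2D displacement (`ActionImpactTable`, `update`, and `best_action` are illustrative names, not from the paper), and the agent simply picks whichever action is estimated to land it closest to a goal.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class ActionImpactTable:
    """Running-mean estimate of each action's observed effect (dx, dy).

    A hand-written stand-in for learned latent action embeddings: the agent
    trusts what an action actually did, not what its label says it does.
    """
    counts: dict[str, int] = field(default_factory=dict)
    means: dict[str, tuple[float, float]] = field(default_factory=dict)

    def update(self, action: str, dx: float, dy: float) -> None:
        """Fold one observed displacement into the action's running mean."""
        n = self.counts.get(action, 0) + 1
        mx, my = self.means.get(action, (0.0, 0.0))
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.means[action] = (mx + (dx - mx) / n, my + (dy - my) / n)
        self.counts[action] = n

    def best_action(self, pos: tuple[float, float], goal: tuple[float, float]) -> str:
        """Return the observed action whose estimated impact ends closest to goal.

        Raises ValueError if no action has been observed yet.
        """
        def dist_after(action: str) -> float:
            mx, my = self.means[action]
            return math.hypot(goal[0] - (pos[0] + mx), goal[1] - (pos[1] + my))

        return min(self.means, key=dist_after)
```

For example, if a broken actuator makes `move_ahead` push the agent backward by 0.25 and `move_back` push it forward by 0.25, then after one observation of each, `best_action((0.0, 0.0), (0.0, 3.0))` returns `"move_back"`: the agent moves forward by moving "backward".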
The impact of giant jellyfish Nemopilema nomurai blooms on plankton communities in a temperate marginal sea
This study focused on the impact of the bloom-developing process of the giant jellyfish, Nemopilema nomurai, on phytoplankton and microzooplankton communities. Two repeated field observations of the jellyfish bloom were conducted in June 2012 and 2014 in the southern Yellow Sea, where blooms of N. nomurai were frequently observed. We demonstrated that the bloom comprised two stages: a developing stage and a mature stage. During the developing stage, total chlorophyll a increased and inorganic nutrient concentrations decreased, whereas during the mature stage both remained stable at low levels. Our analysis revealed that phosphate excreted by growing N. nomurai promoted phytoplankton growth in the developing stage. In the mature stage, the size composition of microzooplankton shifted toward smaller individuals through a top-down process, while phytoplankton composition, affected mainly through a bottom-up process, shifted toward fewer diatoms and cryptophytes and more dinoflagellates.
Poet: Product-oriented Video Captioner for E-commerce
In e-commerce, a growing number of user-generated videos are used for product
promotion. Generating video descriptions that narrate the user-preferred
product characteristics depicted in a video is vital for successful
promotion. Traditional video captioning methods, which focus on routinely
describing what exists and happens in a video, are not well suited to
product-oriented video captioning. To address this problem, we propose a
product-oriented video captioner framework, abbreviated as Poet. Poet first
represents the videos as product-oriented spatial-temporal graphs. Then, based
on the aspects of the video-associated product, we perform knowledge-enhanced
spatial-temporal inference on those graphs to capture dynamic changes in
fine-grained product-part characteristics. The knowledge-leveraging module in
Poet differs from the traditional design by performing knowledge filtering and
dynamic memory modeling. We show that Poet achieves consistent performance
improvements over previous methods in generation quality, capture of product
aspects, and lexical diversity. Experiments are performed on two
product-oriented video captioning datasets, the buyer-generated fashion video
dataset (BFVD) and the fan-generated fashion video dataset (FFVD), collected from
Mobile Taobao. We will release the desensitized datasets to promote further
investigation of both video captioning and general video analysis problems. Comment: 10 pages, 3 figures, to appear in ACM MM 2020 proceeding
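As a rough illustration of what a spatial-temporal graph over video regions can look like, the sketch below builds an edge list under hypothetical assumptions that the abstract does not specify for Poet: each frame contributes a fixed number of region nodes, regions within a frame are fully connected by spatial edges, and region k in frame t is linked to region k in frame t + 1 by a temporal edge. The function name `build_spatiotemporal_graph` is illustrative only.

```python
from itertools import combinations


def build_spatiotemporal_graph(num_frames, regions_per_frame):
    """Return (spatial_edges, temporal_edges) as lists of node-id pairs.

    Node t * regions_per_frame + k represents region k in frame t.
    Assumption: region k refers to the same product part in every frame.
    """
    def node(t, k):
        return t * regions_per_frame + k

    # Spatial edges: every pair of regions inside the same frame.
    spatial = [
        (node(t, a), node(t, b))
        for t in range(num_frames)
        for a, b in combinations(range(regions_per_frame), 2)
    ]
    # Temporal edges: the same region in consecutive frames.
    temporal = [
        (node(t, k), node(t + 1, k))
        for t in range(num_frames - 1)
        for k in range(regions_per_frame)
    ]
    return spatial, temporal
```

For three frames with two regions each, this yields spatial edges `[(0, 1), (2, 3), (4, 5)]` and temporal edges `[(0, 2), (1, 3), (2, 4), (3, 5)]`. Keeping the two edge types separate lets a downstream model treat within-frame structure and across-time change differently.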
Uncertainty-based Traffic Accident Anticipation with Spatio-Temporal Relational Learning
Traffic accident anticipation aims to predict accidents from dashcam videos
as early as possible, which is critical to safety-guaranteed self-driving
systems. With cluttered traffic scenes and limited visual cues, it is highly
challenging to predict how soon an accident will occur from early observed
frames. Most existing approaches learn features of
accident-relevant agents for accident anticipation while ignoring the features
of their spatial and temporal relations. Moreover, current deterministic deep
neural networks can be overconfident in false predictions, leading to a high
risk of traffic accidents caused by self-driving systems. In this paper, we
propose an uncertainty-based accident anticipation model with spatio-temporal
relational learning. It sequentially predicts the probability of traffic
accident occurrence from dashcam videos. Specifically, we propose to take
advantage of graph convolution and recurrent networks for relational feature
learning, and leverage Bayesian neural networks to address the intrinsic
variability of latent relational representations. The derived uncertainty-based
ranking loss is found to significantly boost model performance by improving the
quality of relational features. In addition, we collect a new Car Crash Dataset
(CCD) for traffic accident anticipation, which contains annotations of
environmental attributes and accident reasons. Experimental results on both
public datasets and the newly compiled dataset show state-of-the-art performance
of our model. Our code and CCD dataset are available at
https://github.com/Cogito2012/UString. Comment: Accepted by ACM MM 202